refcnt(5)                                                          refcnt(5)

NAME
     Memory Reference Counters - Analysis of memory access patterns

DESCRIPTION
     The Origin 2000/200 hardware provides memory reference counters to
     help application programmers tune their algorithms for optimal
     performance on a NUMA system. These counters reveal the exact memory
     reference patterns exhibited by an application or a specific
     algorithm. With this information the programmer can optimize the
     application data layout and provide specific memory placement hints to
     the Operating System in order to maximize cache utilization and
     locality of memory access, and so achieve the best memory access
     performance.

     Note that this is an Origin 2000/200 capability only, and does not
     apply to other Origin platforms.

IMPLEMENTATION
   Hardware Reference Counters
     Origin 2000 and Origin 200 systems provide a set of counters for every
     4 KB hardware page of memory. The number of counters per set depends
     on the number of nodes in the system: for systems with 64 or fewer
     nodes (that is, up to 128 processors) a counter set has one counter
     per node, and for systems with more than 64 nodes a counter set has
     one counter for every 8 nodes.

     For systems with 64 or fewer nodes, each counter in a counter set
     counts the number of references from one node. Thus, the application
     programmer can tell exactly how many references have been issued to a
     page from each node in the system. For systems with more than 64
     nodes, each counter in a counter set counts the number of references
     to a page issued by a group of 8 nodes.
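     The rule for the size of a counter set can be sketched in a few lines
     of C. This is only an illustration, not system code: the function
     names are hypothetical, and the handling of a node count that is not a
     multiple of 8, as well as the grouping of consecutive node numbers,
     are assumptions that this page does not confirm.

```c
#include <assert.h>

/* Systems at or below this node count get one counter per node. */
#define REFCNT_DIRECT_NODE_LIMIT 64
/* Above the limit, one counter covers a group of this many nodes. */
#define REFCNT_NODES_PER_GROUP 8

/* Number of counters in one counter set for a system with `nodes` nodes. */
int refcnt_counters_per_set(int nodes)
{
    assert(nodes > 0);
    if (nodes <= REFCNT_DIRECT_NODE_LIMIT)
        return nodes;
    /* Assumption: a partially filled last group still gets a counter. */
    return (nodes + REFCNT_NODES_PER_GROUP - 1) / REFCNT_NODES_PER_GROUP;
}

/* Index of the counter that records references issued by `node`.
 * Assumption: groups are formed from consecutive node numbers. */
int refcnt_counter_index(int nodes, int node)
{
    assert(node >= 0 && node < nodes);
    if (nodes <= REFCNT_DIRECT_NODE_LIMIT)
        return node;
    return node / REFCNT_NODES_PER_GROUP;
}
```

     Under these assumptions an 8 node system has 8 counters per set, and a
     128 node system has 16, each shared by 8 nodes.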
     Note that a hardware page is not equivalent to a base software page
     (or just page). A hardware page defines the granularity at which the
     hardware does reference counting and other hardware operations; a base
     software page is the smallest unit of memory that can be mapped by
     user processes via the Translation Look-aside Buffer or the Page
     Tables. For Origin 2000 and Origin 200 systems a hardware page, and
     therefore the memory reference counting granularity, is 4 KB, and a
     base software page is 16 KB.

     For example, consider an 8 node (16 cpu) Origin 2000 system with the
     memory configuration shown in the table below. This table shows the
     number of hardware pages (equivalent to the number of counter sets),
     the total number of counters, and the number of base software pages
     per node. For this configuration of 8 nodes, each counter set has 8
     counters (one per node).

                          Memory Configuration

                       Memory    Hardware   Counter   Total      Base Software
                                 Pages      Sets      Counters   Pages
     Module   Slot     [bytes]   Mem/4KB    1/Hpage   8*Sets     Mem/16KB

     1        n1       512M      128K       128K      1024K      32K
     1        n2       256M      64K        64K       512K       16K
     1        n3       256M      64K        64K       512K       16K
     1        n4       512M      128K       128K      1024K      32K
     2        n1       256M      64K        64K       512K       16K
     2        n2       64M       16K        16K       128K       4K
     2        n3       64M       16K        16K       128K       4K
     2        n4       256M      64K        64K       512K       16K

     The length of each counter also depends on the system configuration.
     For systems with more than 16 nodes (32 cpus), the counters have a
     length of 19 bits (maximum count is 0x7ffff). For systems with fewer
     than 16 nodes, the length of the counters depends on the kind of
     directory SIMMS installed on the machine. If STANDARD SIMMS are
     installed, then the counters are 11-bit (maximum count 0x7ff); if
     PREMIUM SIMMS are installed, then the counters are 19-bit.
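     The arithmetic behind the table and the counter limits can be
     reproduced with a short C sketch. The type and function names are
     hypothetical and exist only for this illustration; the page sizes and
     counter widths are the ones stated above.

```c
#include <assert.h>
#include <stdint.h>

#define HW_PAGE_SIZE   (4u * 1024u)    /* reference counting granularity */
#define BASE_PAGE_SIZE (16u * 1024u)   /* base software page size */

/* Per-node quantities from the table (hypothetical helper type). */
typedef struct node_counter_layout {
    uint64_t hw_pages;     /* hardware pages, equal to counter sets */
    uint64_t counters;     /* counter sets times counters per set */
    uint64_t base_pages;   /* base software pages */
} node_counter_layout_t;

/* Compute the table columns for a node with mem_bytes of memory in a
 * system whose counter sets hold set_size counters each. */
node_counter_layout_t layout_for_node(uint64_t mem_bytes, unsigned set_size)
{
    node_counter_layout_t l;
    assert(set_size > 0);
    l.hw_pages = mem_bytes / HW_PAGE_SIZE;
    l.counters = l.hw_pages * set_size;
    l.base_pages = mem_bytes / BASE_PAGE_SIZE;
    return l;
}

/* Largest count a hardware counter of the given width can hold. */
uint32_t counter_max(unsigned bits)
{
    assert(bits > 0 && bits < 32);
    return (UINT32_C(1) << bits) - 1u;
}
```

     For a 512 MB node in the 8 node example this gives 131072 hardware
     pages, 1048576 counters and 32768 base software pages; counter_max(11)
     is 0x7ff and counter_max(19) is 0x7ffff.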
   Software Extended Reference Counters
     The hardware counters peg (stop counting) when they reach their
     maximum count. This is a problem for the 11-bit counters, which would
     peg after only 0x7ff (2047) references to a page from one node. To
     allow application programmers to keep track of memory references
     beyond this small number, Cellular Irix provides Software Extended
     Memory Reference Counters.

     The Extended Counters are implemented as an array of 32-bit counters
     that closely mirror the hardware counters, extending their maximum
     count to 2^32 - 1. The hardware counters are set up in such a way that
     they send an interrupt when they reach a threshold close to the
     maximum count. When this interrupt is received by the operating
     system, the current hardware counter count is added to the
     corresponding software extended counter mirror, and the hardware
     counter is reset to 0. This update procedure is performed for complete
     counter sets: when the overflow interrupt is received, the system
     updates not only the counter that is overflowing, but also all the
     other counters in its set.

INTERFACE
   Enabling Reference Counting
     To enable reference counting for a section of virtual memory within an
     application, the programmer can use a Policy Module (mmci(5)) with the
     migration policy set to "MigrationRefcnt".

   Hardware Reference Counters
     The hardware reference counters for a section of an address space can
     be accessed using procfs (proc(4)).
     The ioctl command code used for this purpose is PIOCGETSN0REFCNTRS.
     The third argument is used to specify both the virtual address space
     range we need the counters for, and the buffer where the system should
     copy the counter values to. This argument is of type
     sn0_refcnt_args_t, as defined in <sys/SN/hwcntrs.h>:

          typedef struct sn0_refcnt_args {
                  caddr_t            vaddr;
                  long               len;
                  sn0_refcnt_buf_t*  buf;
          } sn0_refcnt_args_t;

     The first field vaddr is the base of the virtual address space range,
     the field len is the corresponding length in bytes, and the field buf
     is a pointer to a user buffer where the system will store the counter
     values and additional information. This buffer is an array of elements
     of type sn0_refcnt_buf_t, where each element corresponds to the
     counter information associated with one hardware page:

          typedef struct sn0_refcnt_buf {
                  sn0_refcnt_set_t   refcnt_set;
                  __uint64_t         paddr;
                  __uint64_t         page_size;
                  cnodeid_t          cnodeid;
          } sn0_refcnt_buf_t;

     The field refcnt_set contains the set of counters associated with the
     virtual address passed via sn0_refcnt_args, paddr is the address of
     the physical page associated with this virtual address, page_size is
     the page size being used to map it, and cnodeid is the physical page
     home node, expressed in terms of Compact Node Identifiers, which can
     be mapped back to node names using the command topology(1).

     The refcnt_set type is defined by

          typedef struct sn0_refcnt_set {
                  refcnt_t           refcnt[SN0_REFCNT_MAX_COUNTERS];
                  __uint64_t         flags;
          } sn0_refcnt_set_t;

     The field refcnt is the actual set of counters (one counter per node),
     and flags is a state vector reserved for future use.
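     Because the buffer holds one sn0_refcnt_buf_t element per 4 KB
     hardware page, a caller has to size it from the address range. The
     following C sketch shows one way to compute the element count. The
     function name is hypothetical, and the outward rounding of a range
     that is not aligned to a hardware page is an assumption; this page
     does not state how the system treats unaligned ranges.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define HW_PAGE_SIZE 4096u   /* counting granularity stated in this page */

/* Number of hardware pages overlapped by the range [vaddr, vaddr + len).
 * Assumption: one buffer element is needed for every hardware page the
 * range touches, so an unaligned range is rounded outward. */
size_t refcnt_buf_elements(uintptr_t vaddr, size_t len)
{
    if (len == 0)
        return 0;
    assert(vaddr <= UINTPTR_MAX - (len - 1));   /* range must not wrap */
    uintptr_t first = vaddr / HW_PAGE_SIZE;
    uintptr_t last = (vaddr + (len - 1)) / HW_PAGE_SIZE;
    return (size_t)(last - first + 1);
}

/* On IRIX the result would size the array handed to the ioctl, roughly:
 *
 *     sn0_refcnt_args_t args;
 *     args.vaddr = vaddr;
 *     args.len   = len;
 *     args.buf   = buf;   // refcnt_buf_elements(...) elements
 *     ioctl(procfd, PIOCGETSN0REFCNTRS, &args);
 *
 * where procfd (a hypothetical name) is an open proc(4) descriptor for
 * the process. This part is not compiled here because it needs
 * <sys/SN/hwcntrs.h>. */
```

     For example, a 16 KB range that starts on a base software page
     boundary needs 4 elements, one for each hardware page inside it.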
     The counters in refcnt are ordered according to the Compact Node
     Identifiers, also known as cnodeids (numa(5)).

   Software Extended Reference Counters
     The extended reference counters for a section of an address space can
     be accessed using procfs (proc(4)), using practically the same
     interface defined above for the hardware reference counters. The ioctl
     command code used for this purpose is PIOCGETSN0EXTREFCNTRS (the
     difference between this command and the command used for the hardware
     counters is the prefix EXT before the word REFCNTRS).

     The third argument is used to specify both the virtual address space
     range we need the counters for, and the buffer where the system should
     copy the counter values to. This argument is of the same type,
     sn0_refcnt_args_t (defined in <sys/SN/hwcntrs.h>), that is used for
     the hardware counters: the field vaddr is the base of the virtual
     address space range, the field len is the corresponding length in
     bytes, and the field buf is a pointer to a user buffer where the
     system will store the counter values and additional information.
     This buffer is an array of elements of type sn0_refcnt_buf_t, where
     each element corresponds to the counter information associated with
     one hardware page. The sn0_refcnt_buf_t and sn0_refcnt_set_t types,
     the meaning of their fields (refcnt_set, paddr, page_size and cnodeid;
     refcnt and flags), and the ordering of the counters in refcnt by
     cnodeid (numa(5)) are the same as described above for the hardware
     reference counters.

   Memory Mapped Software Extended Reference Counters
     The extended reference counters can also be accessed by mmapping them
     to a user application's virtual address space. This interface is
     intended to be used by performance tools that provide a global system
     view rather than a localized process view.
     This interface is based on a device driver associated with a device
     that represents the reference counters for each node in an Origin
     system. Here is the list of reference counter devices for an 8 node
     system:

          /hw/module/2/slot/n1/node/refcnt
          /hw/module/2/slot/n2/node/refcnt
          /hw/module/2/slot/n3/node/refcnt
          /hw/module/2/slot/n4/node/refcnt
          /hw/module/1/slot/n1/node/refcnt
          /hw/module/1/slot/n2/node/refcnt
          /hw/module/1/slot/n3/node/refcnt
          /hw/module/1/slot/n4/node/refcnt

     To map the counters in a node, a user needs to open the refcnt device
     for the node. Then, using the open file descriptor, the user needs to
     obtain information regarding the counters, defined by rcb_info_t in
     <sys/SN/hwcntrs.h>, using ioctl(fd, RCB_INFO_GET, &rcbinfo).

          typedef struct rcb_info {
                  __uint64_t rcb_len;          /* total refcnt buffer len in bytes */
                  int   rcb_sw_sets;           /* number of sw counter sets in buffer */
                  int   rcb_sw_counters_per_set; /* sw counters per set -- numnodes */
                  int   rcb_sw_counter_size;   /* sizeof(refcnt_t) -- size of sw cntr */
                  int   rcb_base_pages;        /* number of base pages in node */
                  int   rcb_base_page_size;    /* sw base page size */
                  __uint64_t rcb_base_paddr;   /* base physical address for this node */
                  int   rcb_cnodeid;           /* cnodeid for this node */
                  int   rcb_granularity;       /* hw page size used for counter sets */
                  uint  rcb_hw_counter_max;    /* max hwcounter count (width mask) */
                  int   rcb_diff_threshold;    /* current node differential threshold */
                  int   rcb_abs_threshold;     /* current node absolute threshold */
                  int   rcb_num_slots;         /* physmem slots */
          } rcb_info_t;

     Physical memory in a node is not always contiguous, and therefore
     additional information is necessary to determine the counter buffer
     location associated with a physical page. Physical memory within a
     node is divided into a number of contiguous sections called "slots".
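     Since a node's physical memory may contain holes between slots, a tool
     first has to find the slot that contains a given physical address. The
     C sketch below shows that lookup for slots described only by a base
     physical address and a size in bytes. The slot_range_t type and both
     function names are hypothetical stand-ins for this illustration, and
     the sketch makes no claim about how counter sets are laid out inside
     the mapped buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define HW_PAGE_SIZE 4096u   /* counting granularity stated in this page */

/* Hypothetical stand-in for a slot description. */
typedef struct slot_range {
    uint64_t base;   /* base physical address of the slot */
    uint64_t size;   /* size of the slot in bytes */
} slot_range_t;

/* Return the index of the slot containing paddr, or -1 if paddr lies in
 * a hole between slots. */
int slot_for_paddr(const slot_range_t *slots, size_t nslots, uint64_t paddr)
{
    assert(slots != NULL || nslots == 0);
    for (size_t i = 0; i < nslots; i++) {
        if (paddr >= slots[i].base && paddr - slots[i].base < slots[i].size)
            return (int)i;
    }
    return -1;
}

/* Hardware page number of paddr counted from the start of its slot. */
uint64_t hw_page_in_slot(const slot_range_t *slot, uint64_t paddr)
{
    assert(paddr >= slot->base && paddr - slot->base < slot->size);
    return (paddr - slot->base) / HW_PAGE_SIZE;
}
```

     With two 64 MB slots at made-up bases 0x0 and 0x20000000, the address
     0x20002000 falls in the second slot, at hardware page 2 of that slot,
     while 0x10000000 falls in the hole between them.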
     The slot configuration for a node can be obtained using ioctl(fd,
     RCB_SLOT_GET, slotconfig), where slotconfig is of type rcb_slot_t,
     defined in <sys/SN/hwcntrs.h>.

          typedef struct rcb_slot {
                  __uint64_t base;   /* Base physical address for slot */
                  __uint64_t size;   /* Size of slot in bytes */
          } rcb_slot_t;

CAVEATS
     The reference counters, when enabled, can consume a considerable
     amount of memory space for the per-node reference tables.

     The reference counters are not virtualized. This means that if a
     process starts paging, or its pages start migrating, the counter set
     associated with a virtual page will change.

     The extended memory reference counters may be out of sync with the
     hardware reference counters by up to the hardware reference counter
     maximum count (2047 for 11-bit counters and 524287 for 19-bit
     counters).

SEE ALSO
     For more information, see numa(5), mmci(5), proc(4), migration(5),
     sn(1), nstats(1).